
[NNCF]: Add INT8 weight compression conformance test for Tinyllama-1.1b PyTorch model #2636

Merged: 41 commits into openvinotoolkit:develop on May 2, 2024

Conversation

@AdiKsOnDev (Contributor) commented Apr 16, 2024

Changes

  • Added the INT8 compression test suite to the model_scope (see the illustrative sketch after this list)
  • Added TORCH backend support in the LMWeightCompression class
  • For INT8 compression, the dataset and some other parameters (see model_scope) are set to None
  • metric_value has been set to 0.95944
  • Mainly use save_pretrained() for TORCH models
  • Omitted a few method calls that are not supported for TORCH models (check the commits for details)
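
Below is an illustrative sketch of what the new model_scope entry might look like. The field names and values are assumptions inferred from this PR description, not the actual NNCF model_scope schema.

```python
# Illustrative only: an assumed shape for the data-free INT8 TORCH test case.
# Field names mirror the PR description, not the real model_scope definition.
TINYLLAMA_INT8_DATA_FREE = {
    "reported_name": "tinyllama_int8_data_free",       # assumed test id
    "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # assumed HF model id
    "pipeline_cls": "LMWeightCompression",              # class gaining TORCH support in this PR
    "compression_params": {"mode": "int8_asym"},        # INT8 weight-only compression
    "params": {"dataset": None},                        # data-free: no calibration dataset
    "backends": ["TORCH"],
}

if __name__ == "__main__":
    for key, value in TINYLLAMA_INT8_DATA_FREE.items():
        print(f"{key}: {value}")
```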

Reason for changes

Benchmarking of the changes via whowhatbench was requested in issue #2527

Related tickets

ref: 130788
Closes #2527

Tests

  • Added an INT8 weight compression conformance test for the Tinyllama-1.1b PyTorch model

AdiKsOnDev and others added 3 commits April 9, 2024 14:47
@AdiKsOnDev AdiKsOnDev requested a review from a team as a code owner April 16, 2024 15:09
@github-actions bot added the "NNCF PTQ" label (pull requests that update NNCF PTQ) on Apr 16, 2024
@AdiKsOnDev (Contributor, Author) commented Apr 16, 2024

@alexsu52 Requesting a review as per @MaximProshin's guideline.
Unfortunately, I can't assign the PR myself because I don't have the necessary rights.

@MaximProshin MaximProshin requested a review from alexsu52 April 16, 2024 15:29
codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 29.95%. Comparing base (9c00000) to head (e5db8cc).
Report is 1 commit behind head on develop.


@@             Coverage Diff              @@
##           develop    #2636       +/-   ##
============================================
- Coverage    91.21%   29.95%   -61.26%     
============================================
  Files          494      494               
  Lines        45775    45775               
============================================
- Hits         41753    13713    -28040     
- Misses        4022    32062    +28040     

see 330 files with indirect coverage changes

Flag Coverage Δ
COMMON ?
ONNX ?
OPENVINO ?
TENSORFLOW 29.95% <ø> (ø)
TORCH ?

Flags with carried forward coverage won't be shown.

Components Coverage Δ
common 76.35% <ø> (-17.42%) ⬇️
torch 0.01% <ø> (-93.61%) ⬇️
tensorflow 93.74% <ø> (ø)
onnx 0.00% <ø> (-93.07%) ⬇️
openvino 0.00% <ø> (-94.23%) ⬇️
ptq 15.26% <ø> (-74.91%) ⬇️

@AdiKsOnDev (Contributor, Author):

@alexsu52 I think I fixed the code, could you please approve the workflow?

@alexsu52 (Contributor) left a review:

Please check that you have pushed all the changes because I don't see the changes you mentioned in the description.

Please provide local results of the test run.

@AdiKsOnDev (Contributor, Author):

> Please check that you have pushed all the changes because I don't see the changes you mentioned in the description.

Good evening. Sorry, those initial changes are irrelevant. I changed the code a bit because the initial code was not passing the pipeline.

@AdiKsOnDev (Contributor, Author):

> Please provide local results of the test run.

Will send screenshots in a bit.

@AdiKsOnDev (Contributor, Author) commented Apr 19, 2024

@alexsu52

Command

pytest tests/post_training/test_quantize_conformance.py::test_weight_compression -s --data=tests/post_training/data/ -k tinyllama_int8_data_free_backend_PT

Output

(screenshot of the local test output)

@AdiKsOnDev AdiKsOnDev requested a review from alexsu52 April 19, 2024 16:14
@alexsu52 (Contributor):

> @alexsu52
>
> Command
>
> pytest tests/post_training/test_quantize_conformance.py::test_weight_compression -s --data=tests/post_training/data/ -k tinyllama_int8_data_free_backend_PT
>
> Output
>
> (screenshot of the local test output)

Your command does not run any test; there are 0 selected tests in your screenshot. I ran your test using the following command:

pytest tests/post_training/test_quantize_conformance.py::test_weight_compression -s --data=tests/post_training/data/ -k tinyllama_int8_data_free_backend_TORCH

and got the following error:

(screenshot of the runtime error)

@alexsu52 (Contributor) left a review:

The tinyllama_int8_data_free_backend_TORCH test failed with a runtime error.

A general comment: please add a description of the implemented test case to the PR description.

@AdiKsOnDev (Contributor, Author):

@alexsu52 Oh, I just used the command provided in the issue. I'll fix it and tag you ASAP. Thanks for the feedback!

@AdiKsOnDev AdiKsOnDev requested a review from alexsu52 April 22, 2024 21:06
@AdiKsOnDev (Contributor, Author) commented Apr 22, 2024

@alexsu52 Hi, could you verify the logic I followed? It looks good so far. You can check the updated PR description for an overview.

@alexsu52 (Contributor) left a review:

We have a similar test pipeline for PyTorch models in the PTQ test:

def test_ptq_quantization(

You need to reproduce the same pipeline for the weight compression test.
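
As a rough illustration of "the same pipeline", the sketch below shows the shape of a model-scope-driven, parametrized weight compression test. All names, the scope dictionary, and the stubbed pipeline run are assumptions for illustration, not the actual NNCF test code.

```python
# Minimal sketch, not the real NNCF test: it only illustrates a parametrized
# conformance test analogous in structure to test_ptq_quantization.
import pytest

# Assumed scope: maps a reported test name to its reference metric value.
WC_MODEL_SCOPE = {
    "tinyllama_int8_data_free_backend_TORCH": {"metric_value": 0.95944},
}


def run_weight_compression_pipeline(case_name: str) -> float:
    # Stub so the sketch stays self-contained; the real pipeline would build an
    # LMWeightCompression object, compress the model, and evaluate similarity.
    return 0.96


@pytest.mark.parametrize("case_name", sorted(WC_MODEL_SCOPE))
def test_weight_compression(case_name):
    reference = WC_MODEL_SCOPE[case_name]["metric_value"]
    measured = run_weight_compression_pipeline(case_name)
    # Allow a small tolerance around the reference metric value.
    assert measured >= reference - 1e-3
```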

Review thread (outdated, resolved) on tests/post_training/pipelines/lm_weight_compression.py.
@AdiKsOnDev (Contributor, Author):

> We have a similar test pipeline for PyTorch models in the PTQ test:
>
> def test_ptq_quantization(
>
> You need to reproduce the same pipeline for the weight compression test.

Will do.

@AdiKsOnDev (Contributor, Author):

@alexsu52 Good morning, any updates? Do you need me to help you with anything?

@alexsu52 (Contributor) commented May 1, 2024

> @alexsu52 Good morning, any updates? Do you need me to help you with anything?

Hi, I tried to run your PR and got a runtime error. I'll come back with comments after I've done some experiments.

@AdiKsOnDev (Contributor, Author) commented May 1, 2024

> > @alexsu52 Good morning, any updates? Do you need me to help you with anything?
>
> Hi, I tried to run your PR and got a runtime error. I'll come back with comments after I've done some experiments.

Oh, that's weird. I'll try to run it as well.

@AdiKsOnDev (Contributor, Author):

@alexsu52 I removed the problematic line of code; I must have reintroduced it by accidentally undoing a change right before the commit.

@AdiKsOnDev (Contributor, Author):

@alexsu52 did it run?

@alexsu52 (Contributor) left a review:

As I understand it, based on your implementation, you are trying to implement the following test flow:

  1. Create the FP32 PyTorch HF model.
  2. Export the FP32 PyTorch HF model to an FP32 OpenVINO model and save it to fp32_model_dir.
  3. Compress the FP32 PyTorch HF model to INT8.
  4. Export the INT8 PyTorch HF model to an INT8 OpenVINO model and save it to output_model_dir.
  5. Calculate the number of int8 and int4 operations in the INT8 OpenVINO model.
  6. Check the number of int8 and int4 operations against the references.
  7. Calculate the similarity metric between the FP32 OpenVINO model and the INT8 OpenVINO model. The similarity metric is calculated between OpenVINO models for inference optimization on CPU.
  8. Check the similarity metric against the reference.

Is this a correct description?
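
A stub-only sketch of the eight-step flow described above; every function here is a placeholder (not NNCF, Optimum, or OpenVINO API) and the reference values are dummies, so only the control flow is being illustrated.

```python
from pathlib import Path

# Placeholder steps; real code would use HF Transformers, NNCF, and OpenVINO.
def load_fp32_torch_model(model_id: str) -> dict:                # step 1
    return {"id": model_id, "precision": "fp32"}

def export_to_openvino(model: dict, output_dir: Path) -> Path:   # steps 2 and 4
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / "openvino_model.xml"

def compress_to_int8(model: dict) -> dict:                       # step 3
    return {**model, "precision": "int8"}

def count_low_bit_ops(ir_path: Path) -> dict:                    # step 5
    return {"int8": 1, "int4": 0}                                # dummy counts

def similarity(fp32_ir: Path, int8_ir: Path) -> float:           # step 7 (e.g. via whowhatbench)
    return 0.96                                                  # dummy score

def run_flow(model_id: str, work_dir: Path, ref_ops: dict, ref_metric: float) -> None:
    fp32_model = load_fp32_torch_model(model_id)
    fp32_ir = export_to_openvino(fp32_model, work_dir / "fp32")  # fp32_model_dir
    int8_model = compress_to_int8(fp32_model)
    int8_ir = export_to_openvino(int8_model, work_dir / "int8")  # output_model_dir
    assert count_low_bit_ops(int8_ir) == ref_ops                 # step 6
    assert similarity(fp32_ir, int8_ir) >= ref_metric - 1e-3     # step 8

if __name__ == "__main__":
    run_flow("tinyllama-1.1b", Path("./tmp"), {"int8": 1, "int4": 0}, 0.95944)
```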

Review threads (outdated, resolved) on:
  • tests/post_training/data/wc_reference_data.yaml (2 threads)
  • tests/post_training/pipelines/lm_weight_compression.py (6 threads)
@alexsu52 (Contributor) commented May 2, 2024

> @alexsu52 did it run?

Thanks for your update. Please pay attention to my comments.

@AdiKsOnDev (Contributor, Author):

> > @alexsu52 did it run?
>
> Thanks for your update. Please pay attention to my comments.

Good day! Yup, thanks for the review.

@AdiKsOnDev (Contributor, Author) commented May 2, 2024

> As I understand it, based on your implementation, you are trying to implement the following test flow: [the eight steps quoted above]
>
> Is this a correct description?

Yes, except for step 4: I didn't export the INT8 model to OpenVINO.

AdiKsOnDev and others added 7 commits on May 2, 2024, including:

  • Deleted the MODEL_NAME and MODEL_FUNC class attributes
  • Used the export_from_model() function from Optimum

Co-authored-by: Alexander Suslov <[email protected]>
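
For context, here is a hedged sketch of exporting a PyTorch HF model to OpenVINO IR with export_from_model() from Optimum Intel. The model id is assumed for illustration, and the exact arguments of export_from_model() may differ between optimum-intel versions, so treat this as a sketch rather than the test's actual code.

```python
# Sketch only: requires transformers and optimum-intel to be installed; the
# export_from_model() call shown here is an assumption about its signature.
from pathlib import Path

from optimum.exporters.openvino import export_from_model
from transformers import AutoModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed HF model id
model = AutoModelForCausalLM.from_pretrained(model_id)

# In the conformance test this would be the INT8-compressed model; here the
# FP32 model is exported as-is to keep the sketch minimal.
output_dir = Path("./openvino_model")
export_from_model(model, output=output_dir)
```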
@AdiKsOnDev AdiKsOnDev requested a review from alexsu52 May 2, 2024 11:23
@AdiKsOnDev (Contributor, Author):

@alexsu52 Thanks for the detailed review. I implemented all the changes and cleaned up the remaining code; could you review it one last time?

@AdiKsOnDev (Contributor, Author):

@alexsu52 FYI, here's the output:
(screenshot of the test output)

@alexsu52 (Contributor) commented May 2, 2024

> @alexsu52 FYI, here's the output: (screenshot of the test output)

I have run an internal build to validate your changes. It takes some time.

@AdiKsOnDev (Contributor, Author) commented May 2, 2024

> > @alexsu52 FYI, here's the output: (screenshot of the test output)
>
> I have run an internal build to validate your changes. It takes some time.

Yup, I see it; it should be done in a few minutes.

UPD: @alexsu52 The pipeline passed.

@alexsu52 (Contributor) left a review:

LGTM. Thanks for the contribution!

build: manual/job/post_training_weight_compression/57

@alexsu52 alexsu52 merged commit ba7e1a4 into openvinotoolkit:develop May 2, 2024
12 checks passed
@AdiKsOnDev (Contributor, Author):

Thanks for the guidance, have a great day!
